 molecular science


Automated Molecular Concept Generation and Labeling with Large Language Models

Zhang, Shichang, Xia, Botao, Zhang, Zimin, Wu, Qianli, Sun, Fang, Hu, Ziniu, Sun, Yizhou

arXiv.org Artificial Intelligence

Artificial intelligence (AI) is significantly transforming scientific research. Explainable AI methods, such as concept-based models (CMs), are promising for driving new scientific discoveries because they make predictions based on meaningful concepts and offer insights into the prediction process. In molecular science, however, explainable CMs are less common than black-box models like Graph Neural Networks (GNNs), primarily because they require predefined concepts and manual labeling of each instance, which demand domain knowledge and can be labor-intensive. This paper introduces a novel framework for Automated Molecular Concept (AutoMolCo) generation and labeling. AutoMolCo leverages the knowledge in Large Language Models (LLMs) to automatically generate predictive molecular concepts and label them for each molecule. These procedures are repeated through iterative interactions with LLMs to refine the concepts, enabling simple linear models on the refined concepts to outperform GNNs and LLM in-context learning on several benchmarks. The whole AutoMolCo framework is automated, requiring no human knowledge input in concept generation, labeling, or refinement, thereby surpassing the limitations of existing CMs while maintaining their explainability and allowing easy intervention. Through systematic experiments on MoleculeNet and High-Throughput Experimentation (HTE) datasets, we demonstrate that the AutoMolCo-induced explainable CMs are beneficial and promising for molecular science research.


GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text

Liu, Pengfei, Ren, Yiming, Tao, Jun, Ren, Zhixiang

arXiv.org Artificial Intelligence

Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules. However, most existing language models cannot capture the rich information in complex molecular structures or images. In this paper, we introduce GIT-Mol, a multi-modal large language model that integrates graph, image, and text information. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture capable of aligning all modalities into a unified latent space. We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity compared to the baselines. With the any-to-language molecular translation strategy, our model has the potential to perform more downstream tasks, such as compound name recognition and chemical reaction prediction.


#ICML2021 invited talk round-up 2: randomized controlled trials, encoding speech, and molecular science

AIHub

In this post, we summarise the final three invited talks from the International Conference on Machine Learning (ICML). These presentations covered: how machine learning can complement randomised controlled trials, encoding and decoding speech, and molecular science. Esther's work centres on the use of randomised controlled trials (RCT) and she runs policy experiments with the aim of understanding which policies work and which don't. Her work is particularly focussed on reducing poverty. Work of this type involves many causal questions, for which there are often many competing ideas. Such is the field that there is no real guidance for theory; experiments are needed to determine successful policies.

Caris Life Sciences Showcases Results from Novel Machine Learning Approach to Classify Cancer by Molecular Signatures

#artificialintelligence

Caris Life Sciences, a leading innovator in molecular science focused on fulfilling the promise of precision medicine, today presented a poster demonstrating how its advanced machine learning approach, Caris Next Generation Profiling, enables a proprietary algorithm to molecularly classify tumor samples into cancer types. These results, presented at the 2019 American Society of Clinical Oncology (ASCO) Annual Meeting, showcase how analysis of large combined molecular and clinical datasets can improve diagnosis of challenging cases, which is expected to inform increasingly personalized and precise cancer treatments. The poster, "Machine Learning Algorithm Analysis using a Commercial 592-gene NGS Panel to Accurately Predict Tumor Lineage for Carcinoma of Unknown Primary (CUP)," was presented at this morning's Developmental Therapeutics and Tumor Biology (Nonimmuno) Poster Session. Caris scientists detailed how Caris Next Generation Profiling identified molecular classifications for tumor samples with over 95% accuracy using next generation sequencing (NGS) data from 55,780 tumor patients. It generated an unequivocal result in the vast majority of cases of carcinoma of unknown primary (CUP), in which the tissue of origin was ambiguous.